Abstract

Binary descriptors of image patches provide processing
speed advantages and require less storage than methods
that encode the patch appearance with a vector of real
numbers. We provide evidence that, despite its simplicity,
a stochastic hill climbing bit selection procedure for
descriptor construction outperforms recently proposed
alternatives on a standard discriminative power benchmark.
The method is easy to implement and understand, has no free
parameters that need fine tuning, and runs fast.
1 Introduction

Local image patch descriptors have become a widely used
tool in computer vision, used for object/scene recognition,
image retrieval [18], face verification [22], face
alignment [27] and image stitching [4]. Their usefulness and
importance are demonstrated by the large number of
publications that introduced different descriptors. Recently,
binary keypoint descriptors [5, 20, 12, 26, 1, 25, 29, 7] gained
considerable interest as they require less storage and
provide faster matching times compared to descriptors that
encode the patch appearance as a vector of real numbers
[14, 2, 10, 23].

In this paper, we investigate three bit selection
procedures. Two of these have been recently used to construct
discriminative local binary descriptors [24, 25, 29, 7]: the
boosting- and correlation-based methods. We show that
a simple heuristic combinatorial search algorithm
outperforms these approaches on a large discriminative power
benchmark [3]. Additionally, we introduce a new
descriptor based on binarised LBP features and compare it to
competing approaches.


2 Selecting discriminative bits

A binary descriptor consists of b classifiers. Each classifier,
denoted as a bit in further text, outputs a 0 or a 1 for a
given image patch. Thus, a descriptor maps image patches
into binary vectors which are used as signatures for search
engines. The idea is to select individual bits in such a
way that matching patches are "close" in the Hamming
space and non-matching patches are "far". Let AUC(d)
denote the area under the receiver operating
characteristics (ROC) curve for a descriptor d, measured on a set of
matching and non-matching image patch pairs. The true
positive rates (TPRs) and false positive rates (FPRs) are
computed by thresholding the Hamming distance between
signatures. Brown et al. [3] proposed to select the
parameters of real-valued descriptors (such as pooling region
locations) by optimizing the AUC criterion. We apply this
reasoning to select individual bits (i.e., dimensions) of a
binary descriptor. Since the AUC criterion is not
continuous, we apply stochastic hill climbing to achieve our goal.
This also relates our paper to a large body of research
in feature selection (for example, see [17, 9]). The whole
procedure can be summarized by the following steps:

1. Generate a pool P of R bits.

2. Select b bits from P to obtain a descriptor d*.

3. Iterate N times:

   (a) Swap a random bit from d* with a random bit
       from P to obtain d.

   (b) If AUC(d*) < AUC(d), set d* = d.

The number of iterations, N, is set to 4 · R as this led
to good results in all our experiments. Note that the
described procedure does not specify how the individual
bits that make up the pool P are generated. The method
could be taken, for example, from [7].
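
The steps listed in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `auc`, `descriptor_auc` and `hill_climb` are hypothetical, the outputs of all R pool bits are assumed to be precomputed for both patches of every pair (`sig_a`, `sig_b`), the AUC is computed with the pairwise rank statistic, and the replacement bit is drawn from the pool bits that are not currently in d* (the text does not say whether already selected bits may be drawn).

```python
import random


def auc(distances, labels):
    """Area under the ROC curve obtained by thresholding distances.

    Computed as the fraction of (matching, non-matching) pairs in which the
    matching pair has the smaller distance; ties count as one half.
    """
    pos = [d for d, y in zip(distances, labels) if y == 1]
    neg = [d for d, y in zip(distances, labels) if y == 0]
    wins = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def descriptor_auc(selected, sig_a, sig_b, labels):
    """AUC of the descriptor formed by the pool bits listed in `selected`."""
    # Hamming distance restricted to the selected bits, one per patch pair.
    dists = [sum(a[r] != b[r] for r in selected) for a, b in zip(sig_a, sig_b)]
    return auc(dists, labels)


def hill_climb(sig_a, sig_b, labels, b, n_iter=None, seed=0):
    """Select b of the R pool bits by stochastic hill climbing (needs b < R)."""
    rng = random.Random(seed)
    R = len(sig_a[0])
    n_iter = 4 * R if n_iter is None else n_iter      # N = 4 * R as in the text
    best = rng.sample(range(R), b)                    # step 2: initial d*
    best_auc = descriptor_auc(best, sig_a, sig_b, labels)
    for _ in range(n_iter):                           # step 3
        # Assumption: the incoming bit is drawn from the unused pool bits.
        unused = [r for r in range(R) if r not in best]
        cand = list(best)
        cand[rng.randrange(b)] = rng.choice(unused)   # (a) swap one bit
        cand_auc = descriptor_auc(cand, sig_a, sig_b, labels)
        if best_auc < cand_auc:                       # (b) keep improvements
            best, best_auc = cand, cand_auc
    return best, best_auc
```

Here `labels[i]` is 1 for a matching pair and 0 for a non-matching pair. The quadratic-time AUC keeps the sketch short; a sort-based computation would be preferable for datasets with hundreds of thousands of pairs.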





Note that optimizing the AUC criterion is not new in the 
machine learning community [28]. Also, Lin et al. [13] used 
the AUC criterion and other ranking measures to learn 
binary descriptors for image retrieval. The main difference 
between our approach and similar previous work [3, 13] is 
that we eliminate the need for a complicated numerical 
solver by heavily exploiting randomization. We perform 
experimental validation of the proposed approach in the 
next sections. 

3 Comparison of bit selection methods

We compare the proposed procedure to the recent
boosting- and correlation-based bit selection methods.
The boosting-based methods [25, 29] use the principle of
reweighting the training data during learning, inspired by
AdaBoost [8]. The correlation-based method [7]
sequentially selects accurate bits that have low correlation with
the bits already selected. Each method has a continuous
parameter that significantly influences performance: the
boosting shrinkage coefficient and the correlation
threshold, respectively. In our experiments, we select these
parameters to maximize the accuracy on the test set. Note that
the proposed descriptor construction procedure does not have
free parameters that need to be fine-tuned.

We use the dataset introduced by Brown et al. [3] to
provide experimental evidence that the proposed bit
selection procedure has practical value. We report the
results in terms of ROC curves and 95% error rates (the
FPR at 95% TPR). The dataset consists of three subsets:
Notre Dame, Liberty and Yosemite. Each contains a large
number of 64 × 64 rotation- and scale-normalized patches
extracted around DoG keypoints [14]. The ground truth for
each subset consists of 100k, 200k and 500k pairs of patches;
in each set, 50% are matching pairs and 50% are
non-matching pairs. We use a simplified notation for the training
and testing subsets in our experiments. For example, L/ND
will denote the scenario in which the Liberty subset patch
pairs were used for descriptor learning and the Notre Dame
subset patch pairs for testing.
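
As an illustration of the 95% error rate used in this section, the following sketch computes the FPR at a given TPR from the Hamming distances of labelled patch pairs. The function name `fpr_at_tpr` is hypothetical, and the thresholding convention (a pair is declared matching when its distance is at most the threshold, with no interpolation between thresholds) is an assumption; the exact convention used for the reported numbers is not stated in the text.

```python
import math


def fpr_at_tpr(distances, labels, target_tpr=0.95):
    """False positive rate at the smallest threshold reaching target_tpr.

    A pair is declared a match when its Hamming distance is <= threshold.
    labels[i] is 1 for a matching pair and 0 for a non-matching pair.
    """
    pos = sorted(d for d, y in zip(distances, labels) if y == 1)
    neg = [d for d, y in zip(distances, labels) if y == 0]
    # Number of matching pairs that must be accepted to reach the target TPR.
    k = max(1, math.ceil(target_tpr * len(pos)))
    threshold = pos[k - 1]
    # Fraction of non-matching pairs wrongly accepted at that threshold.
    return sum(d <= threshold for d in neg) / len(neg)
```

The function returns a fraction in [0, 1]; multiplying by 100 gives a percentage.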

Selection type   Correlation [7]   Boosting [25, 29]   Proposed
Time [s]         3281              927                 175

Table 2: Processing times for selecting 256 out of 1024 bits
for a dataset of 100k patch pairs.


We compare the mentioned bit selection procedures on
the task of improving the BRIEF descriptor [5]. The
basic idea of BRIEF is to construct a 256-bit descriptor
by performing 256 pixel intensity comparison binary tests
("$I_{x_1,y_1} < I_{x_2,y_2}$") on an incoming image patch. The
pixel sampling locations are fixed at runtime. It is intuitive
that the distribution of these locations matters for accuracy.
The authors of the original paper [5] propose to pick them
at random. Here we show that the accuracy of the
descriptor improves if we carefully select 256 tests out of
1024 random ones. To this end, we compare the boosting-,
correlation- and stochastic hill climbing-based bit selection
procedures on training and test sets from [3]. Figure 1
shows the resulting ROC curves, and Table 1 shows the
95% error rates.
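
To make the pool of BRIEF-style tests concrete, here is a small sketch that draws random pixel-pair tests and evaluates them on a patch. The names `random_tests` and `apply_tests` are hypothetical, and uniform sampling of the pixel locations is an assumption made for illustration; the sampling distribution used in [5] or in these experiments may differ.

```python
import random


def random_tests(n_tests, patch_size=64, seed=0):
    """Draw n_tests pairs of pixel locations ((x1, y1), (x2, y2)).

    Assumption: locations are sampled uniformly over the patch.
    """
    rng = random.Random(seed)

    def point():
        return (rng.randrange(patch_size), rng.randrange(patch_size))

    return [(point(), point()) for _ in range(n_tests)]


def apply_tests(patch, tests):
    """Evaluate intensity comparison tests on a patch given as a list of rows.

    Each bit is 1 if the intensity at (x1, y1) is smaller than at (x2, y2).
    """
    return [int(patch[y1][x1] < patch[y2][x2]) for (x1, y1), (x2, y2) in tests]
```

A pool of 1024 tests would be created once with `random_tests(1024)`; a bit selection procedure then keeps 256 of them, and only those are evaluated at runtime.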

We can see that the proposed method leads to lower
error rates in the majority of scenarios. Also, its behavior
is consistent, i.e., it obtains good results for all train/test
pairs, which is not the case for the competing approaches.
Bit selection processing times can be seen in Table 2.
The parameters of all methods were fixed for these timings.
Thus, this is not an entirely fair comparison since both
boosting- and correlation-based approaches typically require
several reruns for parameter fine-tuning.

The proposed procedure also achieves better results
when individual bits are based on binarised gradients
[25, 7]. We omit these results for brevity.

In the next section we introduce our new keypoint
descriptor (generated with the proposed stochastic hill
climbing bit selection procedure) and compare it to the
state-of-the-art on the problem of visual object search.

Figure 1: The ROC curves of the improved BRIEF descriptors on the testing subsets from [3].


Training   Testing   FPR at 95% TPR [%]
subset     subset    Random [5]      Boosting [25, 29]   Correlation [7]   Proposed
L          ND        55.71 ± 0.91    47.89               43.62             39.26 ± 0.29
Y          ND        55.71 ± 0.91    44.69               41.21             39.24 ± 0.32
ND         L         61.34 ± 0.70    46.49               51.11             47.05 ± 0.26
Y          L         61.34 ± 0.70    49.92               51.21             47.89 ± 0.42
ND         Y         60.94 ± 1.11    45.51               48.92             45.85 ± 0.25
L          Y         60.94 ± 1.11    51.58               49.43             47.46 ± 0.24

Table 1: Accuracy results for the improved BRIEF descriptor on subsets from [3]. The mean and standard deviation
for methods that involve randomization were computed from 10 runs. Random selection [5] involves no training, so
its result depends only on the testing subset.


4 Binarising LBP histograms

Local binary patterns (LBPs) are a well-known computer
vision tool. Heikkila et al. [10] showed that their
histograms can be used for keypoint description (the
authors report results that were superior to SIFT). First,
we use their procedure to extract a real-valued vector
$v = (v_1, v_2, \ldots, v_n) \in \mathbb{R}^n$ from each image patch. Next,
we generate a large number of bits of the following form:

$$\begin{cases} 1, & v_i > v_j \\ 0, & \text{otherwise} \end{cases}$$

(the tuple of two indices, $(i, j)$, specifies the bit; $v_i$ and $v_j$
are the i-th and j-th components of v). Finally, we run the
proposed stochastic hill climbing procedure on the dataset
of Brown et al. [3] to select 256 discriminative bits which
will form our binarised LBP descriptor (bLBP).
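
The pairwise comparison bits used for bLBP can be sketched as follows. The helper names `random_index_pairs` and `binarise` are hypothetical, and drawing the index pairs uniformly at random with i different from j is an assumption; the text only states that a large number of such bits is generated.

```python
import random


def random_index_pairs(n, n_bits, seed=0):
    """Draw n_bits index pairs (i, j) with i != j from a vector of length n."""
    rng = random.Random(seed)
    # rng.sample returns two distinct indices, so i != j for every pair.
    return [tuple(rng.sample(range(n), 2)) for _ in range(n_bits)]


def binarise(v, pairs):
    """Map a real-valued vector v to bits: 1 if v[i] > v[j], else 0."""
    return [1 if v[i] > v[j] else 0 for i, j in pairs]
```

For example, `binarise([0.2, 0.5, 0.1], [(0, 1), (1, 2), (2, 0)])` compares the components pairwise and yields one bit per index pair. The resulting pool of bits would then be passed to the bit selection procedure.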

In the next subsection we compare the proposed
descriptor with recently introduced competing descriptors.

4.1 Experiments in image retrieval 

Following [7], we implemented a simple visual search
system to compare the discriminative power of different
descriptors. Each image is represented with 75 FAST [19]
keypoints. The retrieval is based on the number of
matching keypoints between the query and database images.
Whether two keypoints match or not is determined by
thresholding the Hamming distance of their descriptors.
The threshold is fine-tuned for each descriptor type to give
the best retrieval results on a given database. We use the
following databases:

• UKB [16] (first 300 objects, 4 views each) 

• ZuBuD [21] (200 buildings, 5 images each) 

• COIL-100 [15] (100 objects, first 32 views each) 

• INRIA Holidays [11] (approximately 1500 images of 
500 different scenes) 

For each image in the current database we search for the k 
most similar ones among the remaining images. We report 
the average precision at these top k results as retrieval 
accuracy. The integer k is specific for each database. For 
example, we use k = 3 for UKB since it contains 4 views 
per object and k = 1 for INRIA Holidays since there is a 
variable number of images for each scene. 
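
The retrieval protocol described in this subsection can be sketched as below. This is only an illustration under stated assumptions: the names `hamming`, `match_count` and `precision_at_k` are hypothetical, descriptors are stored as Python integers, and a query keypoint is counted as matched when at least one database keypoint lies within the Hamming threshold (the text does not specify how matches are counted).

```python
def hamming(a, b):
    """Hamming distance between two descriptors stored as integers."""
    return bin(a ^ b).count("1")


def match_count(query_desc, db_desc, threshold):
    """Number of query keypoints with a database keypoint within threshold.

    Assumption: one match per query keypoint is counted at most.
    """
    return sum(
        any(hamming(q, d) <= threshold for d in db_desc) for q in query_desc
    )


def precision_at_k(query, database, k, threshold):
    """Fraction of the top-k retrieved images sharing the query's label.

    `query` is (label, descriptors); `database` is a list of such tuples
    and must not contain the query image itself.
    """
    q_label, q_desc = query
    ranked = sorted(
        database,
        key=lambda item: match_count(q_desc, item[1], threshold),
        reverse=True,
    )
    return sum(label == q_label for label, _ in ranked[:k]) / k
```

Averaging `precision_at_k` over all images of a database, each time using the remaining images as `database`, gives the retrieval accuracy described above.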

Table 3 summarizes the obtained retrieval results (we
used the descriptor extraction code provided by the
respective authors to have a fair comparison). We can see
that the proposed descriptor outperforms all competing
approaches with a size of 256 bits. Also, its results are
comparable to or better than the ones obtained by RFD [7],
which requires more storage (320 or 448 bits). Table 4 shows
the processing speed obtained by each of the methods.


Descriptor         Time [ms]
bLBP               5.1
RFD-G [7]          243
RFD-R [7]          30.5
BinBoost256 [25]   27.6
LDB [29]           0.43
BRIEF [5]          0.1

Table 4: Time in milliseconds needed to extract descriptors
from 75 keypoints for different approaches.

5 Conclusion 

We have shown that a simple stochastic hill climbing
bit selection procedure outperforms recent alternatives
[25, 29, 7] on a standard dataset [3]. We also introduced
a new binary descriptor based on binarised LBP features
that achieved good results in terms of accuracy and
processing speed when compared to competing approaches.
The source code is available at http://public.tel.fer.hr/bitslkt/.